Python for Bioinformatics

This Jupyter notebook is intended to be used alongside the book Python for Bioinformatics.

Note: The sample files must be accessible from this Jupyter notebook before they can be opened. The following commands download them from GitHub and extract them into a directory called samples.

Chapter 14: Graphics in Python

USING BOKEH


In [1]:
!curl https://raw.githubusercontent.com/Serulab/Py4Bio/master/samples/samples.tar.bz2 -o samples.tar.bz2
!mkdir -p samples
!tar xvfj samples.tar.bz2 -C samples


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16.5M  100 16.5M    0     0  16.6M      0 --:--:-- --:--:-- --:--:-- 16.6M
BLAST_output.xml
TAIR7_Transcripts_by_map_position.gz
pMOSBlue.txt
fishbacteria.csv
UniVec_Core.nsq
t3beta.fasta
PythonU.db
input4align.dnd
pdb1apk.ent.gz
readme.txt
contig1.ace
example.aln
hsc1.fasta
bioinfo/seqs/15721870.fasta
primers.txt
bioinfo/seqs/4586830.fasta
bioinfo/seqs/7638455.fasta
GSM188012.CEL
3seqs.fas
sampleX.fas
sampleXblast.xml
B1.csv
phd1
conglycinin.phy
bioinfo/seqs/218744616.fasta
spfile.txt
bioinfo/seqs/513419.fasta
bioinfo/seqs/513710.fasta
prot.fas
cas9align.fasta
seqA.fas
bioinfo/seqs/
bioinfo/
pdbaa
other.xml
vectorssmall.fasta
t3.fasta
a19.gp
data.csv
input4align.fasta
B1IXL9.txt
fasta22.fas
bioinfo/seqs/7415878.fasta
bioinfo/seqs/513718.fasta
bioinfo/seqs/513719.fasta
bioinfo/seqs/6598312.fasta
UniVec_Core.nin
Q5R5X8.fas
bioinfo/seqs/513717.fasta
BcrA.gp
bioinfo/seqs/2623545.fasta
bioinfo/seqs/63108399.fasta
conglycinin.dnd
NC2033.txt
fishdata.csv
uniprotrecord.xml
BLAST_output.html
Q9JJE1.xml
test3.csv
UniVec_Core.nhr
sampledata.xlsx
UniVec_Core
NC_006581.gb
conglycinin.multiple.phy
conglycinin.fasta
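
The shell commands above can also be written in Python, which helps where curl or tar is not available. This is a minimal sketch using only the standard library; the helper names fetch_samples and extract_samples are hypothetical and are not part of the book's code.

```python
import tarfile
import urllib.request
from pathlib import Path

# Same URL as the curl command in the first cell.
SAMPLES_URL = ("https://raw.githubusercontent.com/Serulab/Py4Bio/"
               "master/samples/samples.tar.bz2")


def fetch_samples(url=SAMPLES_URL, archive="samples.tar.bz2"):
    """Download the archive unless a local copy already exists."""
    archive = Path(archive)
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return archive


def extract_samples(archive, dest="samples"):
    """Unpack a bzip2-compressed tar archive into dest, creating it if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(dest)
    # Return the top-level names so the caller can check what was unpacked.
    return sorted(p.name for p in dest.iterdir())
```

Calling extract_samples(fetch_samples()) should leave the same files in samples as the shell version. Unlike mkdir, it does not complain when the directory already exists.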

Listing 14.1: basiccircle.py: A circle made with Bokeh


In [2]:
from bokeh.plotting import figure, output_file, show

p = figure(width=400, height=400)
p.circle(2, 3, radius=.5, alpha=0.5)
output_file("out.html")
show(p)

Listing 14.2: fourcircles.py: 4 circles made with Bokeh


In [3]:
from bokeh.plotting import figure, output_file, show

p = figure(width=500, height=500)
x = [1, 1, 2, 2]
y = [1, 2, 1, 2]
p.circle(x, y, radius=.35, alpha=0.5, color='red')
output_file("out.html")
show(p)

Listing 14.3: plot1.py: A minimal plot


In [4]:
from bokeh.plotting import figure, output_file, show

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [.7, 1.4, 2.1, 3, 3.85, 4.55, 5.8, 6.45]

p = figure(title='Mean wt increased vs. time',
           x_axis_label='Time in days',
           y_axis_label='% Mean WT increased')
# legend_label replaces the legend keyword, deprecated since Bokeh 1.4
p.circle(x, y, legend_label='Subject 1', size=10)
output_file('test.html')
show(p)

Listing 14.4: plot2.py: Two data series plot


In [5]:
from bokeh.plotting import figure, output_file, show

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [.7, 1.4, 2.1, 3, 3.85, 4.55, 5.8, 6.45]
z = [.5, 1.1, 1.9, 2.5, 3.1, 3.9, 4.85, 5.2]

p = figure(title='Mean wt increased vs. time',
           x_axis_label='Time in days',
           y_axis_label='% Mean WT increased')
# legend_label replaces the legend keyword, deprecated since Bokeh 1.4
p.circle(x, y, legend_label='Subject 1', size=10)
p.circle(x, z, legend_label='Subject 2', size=10, line_color='red',
         fill_color='white')
p.legend.location = 'top_left'
output_file('test.html')
show(p)

Listing 14.5: fishpc.py: Scatter plot


In [3]:
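# bokeh.charts is not part of current Bokeh releases (it was split out as
# the unmaintained bkcharts package), so this listing needs an older Bokeh.
from bokeh.charts import Scatter, output_file, show
import pandas as pd

# DataFrame.from_csv was removed from pandas; read_csv with these
# arguments is the documented equivalent.
df = pd.read_csv('samples/fishdata.csv', index_col=0, parse_dates=True)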

scatter = Scatter(df, x='PC1', y='PC2', color='feeds',
        marker='species', title=
        'Metabolic variations based on 1H NMR profiling of fishes',
        xlabel='Principal Component 1: 35.8%',
        ylabel='Principal Component 2: 15.1%')
scatter.legend.background_fill_alpha = 0.3
output_file('scatter.html')
show(scatter)
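
Because bokeh.charts is missing from current Bokeh releases, a similar figure can be built with bokeh.plotting alone. The following is a sketch under assumptions, not the book's method: the helper name grouped_scatter, the marker list, the palette, and the marker size are all choices made here for illustration. It draws one scatter glyph per (color group, marker group) pair.

```python
from bokeh.palettes import Category10
from bokeh.plotting import figure

# Marker names accepted by figure.scatter; the selection is arbitrary.
MARKERS = ["circle", "square", "triangle", "diamond"]


def grouped_scatter(df, x, y, color_col, marker_col, **figure_kwargs):
    """Plot df[x] against df[y], colored by one column and marked by another."""
    p = figure(**figure_kwargs)
    palette = Category10[10]
    # Assign a color to each distinct value of color_col, cycling if needed.
    colors = {value: palette[i % len(palette)]
              for i, value in enumerate(sorted(df[color_col].unique()))}
    # Assign a marker to each distinct value of marker_col, cycling if needed.
    markers = {value: MARKERS[i % len(MARKERS)]
               for i, value in enumerate(sorted(df[marker_col].unique()))}
    for (c_value, m_value), group in df.groupby([color_col, marker_col]):
        p.scatter(group[x], group[y], size=8,
                  color=colors[c_value], marker=markers[m_value],
                  legend_label=f"{c_value} / {m_value}")
    return p
```

With the data frame from Listing 14.5, a call such as grouped_scatter(df, 'PC1', 'PC2', 'feeds', 'species', title='...', x_axis_label='Principal Component 1: 35.8%', y_axis_label='Principal Component 2: 15.1%') returns a figure that can be passed to show as in the earlier listings.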

Listing 14.6: heatmap.py: Plot a gene expression file


In [4]:
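# bokeh.charts is not part of current Bokeh releases (it was split out as
# the unmaintained bkcharts package), so this listing needs an older Bokeh.
from bokeh.charts import HeatMap, bins, output_file, show
import pandas as pd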

DATA_FILE = 'samples/GSM188012.CEL'
dtype = {'x': int, 'y': int, 'lux': float}
dataset = pd.read_csv(DATA_FILE, sep='\t', dtype=dtype)
hm = HeatMap(dataset, x=bins('x'), y=bins('y'), values='lux',
             title='Expression', stat='mean')
output_file("heatmap7.html", title="heatmap.py example")
show(hm)
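
To make the aggregation in this listing concrete: binning x and y and using stat='mean' amounts to grouping the points into grid cells and averaging the lux values in each cell. The sketch below reproduces that idea with the standard library only. The function names and the fixed n_bins parameter are assumptions made for illustration; the number of bins chosen by bins() itself may differ.

```python
from collections import defaultdict
from statistics import mean


def bin_index(value, low, high, n_bins):
    """Map a value in [low, high] to a bin number in 0..n_bins-1."""
    if high == low:
        return 0
    position = int((value - low) / (high - low) * n_bins)
    return min(position, n_bins - 1)  # the maximum falls into the last bin


def binned_mean(rows, n_bins=4):
    """Average the 'lux' values of all rows that share an (x, y) bin."""
    x_low, x_high = min(r["x"] for r in rows), max(r["x"] for r in rows)
    y_low, y_high = min(r["y"] for r in rows), max(r["y"] for r in rows)
    cells = defaultdict(list)
    for r in rows:
        key = (bin_index(r["x"], x_low, x_high, n_bins),
               bin_index(r["y"], y_low, y_high, n_bins))
        cells[key].append(r["lux"])
    # One mean per occupied cell; empty cells are simply absent.
    return {key: mean(values) for key, values in cells.items()}
```

Each key of the returned dictionary is a (column, row) cell of the heat map and each value is the color intensity for that cell.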

Listing 14.7: chord.py: A Chord diagram


In [5]:
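# bokeh.charts is not part of current Bokeh releases (it was split out as
# the unmaintained bkcharts package), so this listing needs an older Bokeh.
from bokeh.charts import output_file, Chord
from bokeh.io import show
import pandas as pd

data = pd.read_csv('samples/test3.csv')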
chord_from_df = Chord(data, source='name_x', target='name_y',
                      value='value')
output_file('chord.html')
show(chord_from_df)